

Global Convergence of Adjoint-Optimized Neural PDEs

Riedl, Konstantin, Sirignano, Justin, Spiliopoulos, Konstantinos

arXiv.org Artificial Intelligence

Many engineering and scientific fields have recently become interested in modeling terms in partial differential equations (PDEs) with neural networks, which requires solving the inverse problem of learning neural network terms from observed data in order to approximate missing or unresolved physics in the PDE model. The resulting neural-network PDE model, being a function of the neural network parameters, can be calibrated to the available ground truth data by optimizing over the PDE using gradient descent, where the gradient is evaluated in a computationally efficient manner by solving an adjoint PDE. These neural PDE models have emerged as an important research area in scientific machine learning. In this paper, we study the convergence of the adjoint gradient descent optimization method for training neural PDE models in the limit where both the number of hidden units and the training time tend to infinity. Specifically, for a general class of nonlinear parabolic PDEs with a neural network embedded in the source term, we prove convergence of the trained neural-network PDE solution to the target data (i.e., a global minimizer). The global convergence proof poses a unique mathematical challenge that is not encountered in finite-dimensional neural network convergence analyses, due to (i) the training dynamics involving a non-local neural network kernel operator in the infinite-width hidden layer limit, where the kernel's eigenvalues lack a spectral gap, and (ii) the nonlinearity of the limit PDE system, which makes the optimization problem non-convex in the neural network function even in the infinite-width limit (unlike typical neural network training, where the optimization problem becomes convex in the large-neuron limit). The theoretical results are illustrated and empirically validated by numerical studies.
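A minimal numerical sketch of the adjoint idea described above, for a 1-D heat equation with an unknown source term standing in for the embedded neural network. The discretization, step sizes, and direct parameterization of the source at grid points are illustrative assumptions, not the paper's setup.

```python
# Sketch (not the paper's implementation): discrete adjoint gradient for
# u_t = nu * u_xx + f(x), where the source f stands in for the neural network.
import numpy as np

nx, nt, nu, dt = 50, 200, 0.1, 1e-4
x = np.linspace(0.0, 1.0, nx)
dx = x[1] - x[0]

def laplacian(u):
    """Second difference with homogeneous Dirichlet boundaries."""
    lap = np.zeros_like(u)
    lap[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
    return lap

def forward(f):
    """Explicit-Euler solve; returns the full trajectory."""
    traj = [np.sin(np.pi * x)]
    for _ in range(nt):
        u = traj[-1]
        traj.append(u + dt * (nu * laplacian(u) + f))
    return traj

def adjoint_gradient(f, target):
    """dJ/df for J = ||u(T) - target||^2 via the discrete adjoint recursion."""
    traj = forward(f)
    lam = 2.0 * (traj[-1] - target)            # terminal adjoint condition
    grad = np.zeros_like(f)
    for _ in range(nt):                        # march the adjoint backwards
        grad += dt * lam
        lam = lam + dt * nu * laplacian(lam)   # L is symmetric here
    return grad, traj[-1]

# Gradient descent on the source term against synthetic target data.
f_true = np.exp(-50.0 * (x - 0.5) ** 2)
target = forward(f_true)[-1]
f_hat = np.zeros(nx)
for it in range(500):
    g, u_final = adjoint_gradient(f_hat, target)
    f_hat -= 50.0 * g                          # illustrative step size
print("final loss:", np.sum((u_final - target) ** 2))
```

Note the efficiency the abstract refers to: one backward (adjoint) solve yields the gradient with respect to the entire source term, regardless of how many parameters it has.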


Exact Dynamics of Multi-class Stochastic Gradient Descent

Collins-Woodfin, Elizabeth, Seroussi, Inbar

arXiv.org Machine Learning

We develop a framework for analyzing the training and learning rate dynamics of a variety of high-dimensional optimization problems trained using one-pass stochastic gradient descent (SGD) with data generated from multiple anisotropic classes. We give exact expressions for a large class of functions of the limiting dynamics, including the risk and the overlap with the true signal, in terms of a deterministic solution to a system of ODEs. We extend the existing theory of high-dimensional SGD dynamics to Gaussian-mixture data and a large (growing with the parameter size) number of classes. We then investigate in detail the effect of the anisotropic structure of the data covariance in the problems of binary logistic regression and least squares loss. We study three cases: isotropic covariances, data covariance matrices with a large fraction of zero eigenvalues (denoted the zero-one model), and covariance matrices with spectra following a power-law distribution. We show that there exists a structural phase transition. In particular, we demonstrate that, for the zero-one model and the power-law model with sufficiently large power, SGD tends to align more closely with the components of the class mean projected onto the "clean directions" (i.e., directions of smaller variance). This is supported by both numerical simulations and analytical studies, which show the exact asymptotic behavior of the loss in the high-dimensional limit.
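For concreteness, a small simulation of the setting studied above: one-pass SGD for binary logistic regression on a two-class Gaussian mixture with the "zero-one" covariance (half the eigenvalues zero). Dimensions, step size, and the 1/d scaling are illustrative assumptions.

```python
# Sketch: streaming SGD on an anisotropic Gaussian mixture (zero-one model).
import numpy as np

rng = np.random.default_rng(0)
d, steps, lr = 500, 20000, 0.5
mu = rng.standard_normal(d) / np.sqrt(d)                        # class mean
eigs = np.concatenate([np.zeros(d // 2), np.ones(d - d // 2)])  # zero-one spectrum
theta = np.zeros(d)

for _ in range(steps):
    y = rng.choice([-1.0, 1.0])                                 # class label
    x = y * mu + np.sqrt(eigs) * rng.standard_normal(d)         # anisotropic sample
    z = y * (x @ theta)
    grad = -y * x / (1.0 + np.exp(z))                           # logistic loss gradient
    theta -= (lr / d) * grad                                    # one-pass step

# Overlap with the signal, split into "clean" (zero-variance) and noisy parts.
clean = slice(0, d // 2)
print("overlap, clean directions:", theta[clean] @ mu[clean])
print("overlap, noisy directions:", theta[clean.stop:] @ mu[clean.stop:])
```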


Learning Swarm Interaction Dynamics from Density Evolution

Mavridis, Christos, Tirumalai, Amoolya, Baras, John

arXiv.org Artificial Intelligence

We consider the problem of understanding the coordinated movements of biological or artificial swarms. In this regard, we propose a learning scheme to estimate the coordination laws of the interacting agents from observations of the swarm's density over time. We describe the dynamics of the swarm based on pairwise interactions according to a Cucker-Smale flocking model, and express the swarm's density evolution as the solution to a system of mean-field hydrodynamic equations. We propose a new family of parametric functions to model the pairwise interactions, which allows for the mean-field macroscopic system of integro-differential equations to be efficiently solved as an augmented system of PDEs. Finally, we incorporate the augmented system in an iterative optimization scheme to learn the dynamics of the interacting agents from observations of the swarm's density evolution over time. The results of this work can offer an alternative approach to study how animal flocks coordinate, create new control schemes for large networked systems, and serve as a central part of defense mechanisms against adversarial drone attacks.
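A microscopic (particle-level) Cucker-Smale simulation may help fix ideas; the paper works with the mean-field hydrodynamic limit of exactly this kind of pairwise velocity alignment. The communication-kernel constants below are illustrative.

```python
# Sketch: microscopic Cucker-Smale flocking, the particle system whose
# mean-field density evolution the paper learns from.
import numpy as np

rng = np.random.default_rng(1)
n, dt, steps = 100, 0.01, 2000
x = rng.uniform(0.0, 1.0, (n, 2))      # positions
v = rng.standard_normal((n, 2))        # velocities

def psi(r, K=1.0, sigma=1.0, beta=0.5):
    """Cucker-Smale communication rate, decaying with distance."""
    return K / (sigma**2 + r**2) ** beta

for _ in range(steps):
    diff = x[None, :, :] - x[:, None, :]                 # pairwise displacements
    r = np.linalg.norm(diff, axis=-1)                    # pairwise distances
    w = psi(r) / n
    dv = (w[:, :, None] * (v[None, :, :] - v[:, None, :])).sum(axis=1)
    v += dt * dv                                         # velocity alignment
    x += dt * v

print("velocity spread after flocking:", v.std(axis=0))  # shrinks toward consensus
```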


Advanced Physics-Informed Neural Network with Residuals for Solving Complex Integral Equations

Moghaddam, Mahdi Movahedian, Parand, Kourosh, Kheradpisheh, Saeed Reza

arXiv.org Artificial Intelligence

Integral and integro-differential equations are foundational tools in many fields of science and engineering, modeling a wide range of phenomena from physics and biology to economics and engineering systems [1-3]. These equations describe processes that depend not only on local variables but also on historical or spatial factors, making them essential for understanding systems with memory effects, hereditary characteristics, and long-range interactions [4-7]. Despite their importance, solving integral and integro-differential equations is a challenging task due to the complexity of their integral operators, especially when extended to multi-dimensional or fractional forms [2, 8]. Classical numerical methods, such as finite difference [9, 10], finite element [11, 12], and spectral methods [13-15], have long been used to approximate solutions to these equations. However, these methods often suffer from several limitations.
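As a concrete instance of the classical approach discussed above, here is a trapezoidal-rule solver for a Volterra integral equation of the second kind, u(t) = f(t) + ∫_0^t k(t,s) u(s) ds. The kernel and forcing are illustrative; the point is the growing quadrature sum that makes such operators expensive in higher dimensions.

```python
# Sketch: classical trapezoidal discretization of a Volterra integral equation.
import numpy as np

nt = 200
t = np.linspace(0.0, 1.0, nt)
h = t[1] - t[0]
k = lambda ti, s: np.exp(-(ti - s))        # example kernel
f = lambda ti: np.cos(ti)                  # example forcing

u = np.zeros(nt)
u[0] = f(t[0])
for i in range(1, nt):
    s = t[: i + 1]
    w = np.full(i + 1, h); w[0] = w[-1] = h / 2.0   # trapezoid weights
    # u_i appears on both sides: u_i = f_i + sum_{j<i} w_j k u_j + w_i k(t_i,t_i) u_i
    known = f(t[i]) + np.sum(w[:-1] * k(t[i], s[:-1]) * u[:i])
    u[i] = known / (1.0 - w[-1] * k(t[i], t[i]))
print("u(1) ≈", u[-1])
```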


Modeling AdaGrad, RMSProp, and Adam with Integro-Differential Equations

Heredia, Carlos

arXiv.org Artificial Intelligence

In this paper, we propose a continuous-time formulation for the AdaGrad, RMSProp, and Adam optimization algorithms by modeling them as first-order integro-differential equations. We perform numerical simulations of these equations to demonstrate their validity as accurate approximations of the original algorithms. Our results indicate a strong agreement between the behavior of the continuous-time models and the discrete implementations, thus providing a new perspective on the theoretical understanding of adaptive optimization methods. Finding the global minima of such functions presents a significant challenge due to the inherent complexity and non-convexity of the landscape. Gradient Descent (GD) remains one of the most prominent algorithms for minimizing a function f by iteratively updating the parameters θ (Boyd & Vandenberghe, 2004). It operates by adjusting the parameters in the direction of steepest descent of f with a fixed step size α (learning rate). At each iteration, the algorithm computes the gradient of f with respect to θ and updates θ_{k+1} = θ_k − α∇f(θ_k), progressively decreasing f (Rumelhart et al., 1986). The continuous nature of these methods permits a more direct application of differential equation techniques. For readers interested in a continuous description of the stochastic method, we refer to Sirignano & Spiliopoulos (2017). Adaptive optimization methods such as AdaGrad (Duchi et al., 2011) and RMSProp (Hinton, 2012) have been pivotal in advancing gradient-based algorithms.
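A hedged sketch of the paper's theme, using RMSProp: the discrete recursion next to a continuous-time model in which the second-moment estimate is an exponentially weighted memory integral. The quadratic objective and constants are illustrative, and this is a plausible reading of the construction rather than the paper's exact equations.

```python
# Sketch: discrete RMSProp vs. a continuous-time (integro-differential) model.
import numpy as np

grad = lambda th: th                      # f(theta) = theta^2 / 2
alpha, beta, eps, steps = 0.1, 0.9, 1e-8, 20

# Discrete RMSProp iterates.
th, v = 2.0, 0.0
disc = [th]
for _ in range(steps):
    g = grad(th)
    v = beta * v + (1.0 - beta) * g**2
    th -= alpha * g / np.sqrt(v + eps)
    disc.append(th)

# Continuous-time model, integrated with fine Euler substeps. With unit step,
#   v'(t) = (1 - beta)(g(t)^2 - v(t)),  theta'(t) = -alpha g(t) / sqrt(v(t)+eps),
# so v(t) is the exponential memory integral
#   v(t) = (1 - beta) * \int_0^t e^{-(1-beta)(t-s)} g(s)^2 ds.
th_c, v_c, sub = 2.0, 0.0, 100
dt = 1.0 / sub
cont = [th_c]
for _ in range(steps):
    for _ in range(sub):
        g = grad(th_c)
        v_c += dt * (1.0 - beta) * (g**2 - v_c)
        th_c -= dt * alpha * g / np.sqrt(v_c + eps)
    cont.append(th_c)

for k in (5, 10, 20):   # the trajectories track closely during the descent phase
    print(f"step {k:2d}: discrete {disc[k]:+.3f}  continuous {cont[k]:+.3f}")
```

Near the minimum the discrete method chatters at the O(α) scale while the ODE settles, which is the kind of discrepancy a continuous-time analysis has to account for.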


PINNIES: An Efficient Physics-Informed Neural Network Framework to Integral Operator Problems

Aghaei, Alireza Afzal, Moghaddam, Mahdi Movahedian, Parand, Kourosh

arXiv.org Artificial Intelligence

This paper introduces an efficient tensor-vector product technique for the rapid and accurate approximation of integral operators within physics-informed deep learning frameworks. Our approach leverages neural network architectures to evaluate problem dynamics at specific points, while employing Gaussian quadrature formulas to approximate the integral components, even in the presence of infinite domains or singularities. We demonstrate the applicability of this method to both Fredholm and Volterra integral operators, as well as to optimal control problems involving continuous time. Additionally, we outline how this approach can be extended to approximate fractional derivatives and integrals and propose a fast matrix-vector product algorithm for efficiently computing the fractional Caputo derivative. In the numerical section, we conduct comprehensive experiments on forward and inverse problems. For forward problems, we evaluate the performance of our method on over 50 diverse mathematical problems, including multi-dimensional integral equations, systems of integral equations, partial and fractional integro-differential equations, and various optimal control problems in delay, fractional, multi-dimensional, and nonlinear configurations. For inverse problems, we test our approach on several integral equations and fractional integro-differential problems. Finally, we introduce the pinnies Python package to facilitate the implementation and usability of the proposed method.
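The quadrature idea described above can be illustrated in a few lines: once Gauss-Legendre nodes and weights are fixed, a Fredholm operator collapses to a precomputable matrix applied to the solution values at the nodes. The kernel and test function below are illustrative assumptions, not the package's API.

```python
# Sketch: a Fredholm operator (K u)(x) = \int_0^1 k(x, s) u(s) ds reduced to
# a single matrix-vector product via Gauss-Legendre quadrature.
import numpy as np

n = 32
nodes, weights = np.polynomial.legendre.leggauss(n)
s = 0.5 * (nodes + 1.0)                    # map [-1, 1] -> [0, 1]
w = 0.5 * weights

x = np.linspace(0.0, 1.0, 50)              # evaluation points
k = lambda xi, si: np.exp(xi * si)         # example kernel
u = np.sin(np.pi * s)                      # u sampled at quadrature nodes

K = k(x[:, None], s[None, :]) * w[None, :] # quadrature matrix, built once
Ku = K @ u                                 # the whole operator in one product
print("(K u)(0) ≈", Ku[0])
```

In a physics-informed setting, u at the nodes comes from the network, so the integral term of the residual costs one matrix-vector product per loss evaluation.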


UniFIDES: Universal Fractional Integro-Differential Equation Solvers

Saadat, Milad, Mangal, Deepak, Jamali, Safa

arXiv.org Artificial Intelligence

The development of data-driven approaches for solving differential equations has been followed by a plethora of applications in science and engineering across a multitude of disciplines and remains a central focus of active scientific inquiry. However, a large body of natural phenomena incorporates memory effects that are best described via fractional integro-differential equations (FIDEs), in which the integral or differential operators accept non-integer orders. Addressing the challenges posed by nonlinear FIDEs is a recognized difficulty, necessitating the application of generic methods with immediate practical relevance. This work introduces the Universal Fractional Integro-Differential Equation Solvers (UniFIDES), a comprehensive machine learning platform designed to expeditiously solve a variety of FIDEs in both forward and inverse directions, without the need for ad hoc manipulation of the equations. The effectiveness of UniFIDES is demonstrated through a collection of integer-order and fractional problems in science and engineering. Our results highlight UniFIDES' ability to accurately solve a wide spectrum of integro-differential equations and offer the prospect of using machine learning platforms universally for discovering and describing dynamical and complex systems.
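For readers unfamiliar with non-integer-order operators, here is the standard L1 finite-difference rule for a Caputo derivative of order 0 < a < 1, checked against a known closed form. This is background for FIDEs generally, not UniFIDES' own machine learning method.

```python
# Sketch: L1 discretization of the Caputo derivative, validated on u(t) = t^2.
import numpy as np
from math import gamma

a, nt = 0.5, 400
t = np.linspace(0.0, 1.0, nt + 1)
h = t[1] - t[0]
u = t**2                                           # test function

j = np.arange(nt)
b = (j + 1.0) ** (1.0 - a) - j ** (1.0 - a)        # L1 weights
du = np.diff(u)                                    # u_{k+1} - u_k

# Caputo derivative at the final time t = 1: most recent increment gets b_0.
D = np.sum(b * du[::-1]) / (gamma(2.0 - a) * h**a)
exact = 2.0 * t[-1] ** (2.0 - a) / gamma(3.0 - a)  # closed form for t^2
print("L1 approx:", D, " exact:", exact)
```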


The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms

Collins-Woodfin, Elizabeth, Seroussi, Inbar, Malaxechebarría, Begoña García, Mackenzie, Andrew W., Paquette, Elliot, Paquette, Courtney

arXiv.org Machine Learning

In deterministic optimization, adaptive stepsize strategies, such as line search (see [39] and references therein), AdaGrad-Norm [55], Polyak stepsize [46], and others were developed to provide stability and improve efficiency and adaptivity to unknown parameters. While the practical benefits for deterministic optimization problems are well-documented, our understanding of adaptive learning rate strategies for stochastic algorithms is still in its infancy. There are many adaptive learning rate strategies used in machine learning, designed with many different goals. Some are known to adapt to SGD gradient noise while others are robust to hyper-parameters (e.g., [4, 59]). Theoretical results for adaptive algorithms tend to focus on guaranteeing minimax-optimal rates, but this theory is not engineered to provide realistic performance comparisons; indeed, many adaptive algorithms are minimax-optimal, so more precise statements are needed to distinguish them. For instance, the exact learning rates (or rate schedules) to which these strategies converge are unknown, as is their dependence on the geometry of the problem. Moreover, we often do not know how these adaptive stepsizes compare with well-tuned constant or decaying fixed learning rate stochastic gradient descent (SGD), a comparison that can be viewed as the cost of selecting the adaptive strategy rather than tuning by hand.
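Among the strategies named above, AdaGrad-Norm has a particularly compact form: a single scalar stepsize shrunk by the accumulated gradient norm. A minimal streaming least-squares sketch follows (the problem and constants are illustrative).

```python
# Sketch: AdaGrad-Norm, eta_k = eta / sqrt(b^2 + sum_{i<=k} ||g_i||^2),
# on a streaming least-squares problem.
import numpy as np

rng = np.random.default_rng(2)
d, steps, eta, b2 = 50, 5000, 1.0, 1e-8
w_star = rng.standard_normal(d)
theta = np.zeros(d)
acc = b2                                    # running sum of ||g||^2

for _ in range(steps):
    x = rng.standard_normal(d)              # one streaming sample
    g = (x @ theta - x @ w_star) * x        # stochastic least-squares gradient
    acc += g @ g
    theta -= (eta / np.sqrt(acc)) * g       # AdaGrad-Norm step

print("effective stepsize:", eta / np.sqrt(acc))
print("distance to w*:", np.linalg.norm(theta - w_star))
```

The learning rate schedule this rule converges to, and how it compares with a well-tuned fixed schedule, is exactly the kind of question the exact risk curves above answer.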


Hitting the High-Dimensional Notes: An ODE for SGD learning dynamics on GLMs and multi-index models

Collins-Woodfin, Elizabeth, Paquette, Courtney, Paquette, Elliot, Seroussi, Inbar

arXiv.org Artificial Intelligence

We analyze the dynamics of streaming stochastic gradient descent (SGD) in the high-dimensional limit when applied to generalized linear models and multi-index models (e.g. logistic regression, phase retrieval) with general data-covariance. In particular, we demonstrate a deterministic equivalent of SGD in the form of a system of ordinary differential equations that describes a wide class of statistics, such as the risk and other measures of sub-optimality. This equivalence holds with overwhelming probability when the model parameter count grows proportionally to the number of data samples. This framework allows us to obtain learning rate thresholds for stability of SGD as well as convergence guarantees. In addition to the deterministic equivalent, we introduce an SDE with a simplified diffusion coefficient (homogenized SGD) which allows us to analyze the dynamics of general statistics of SGD iterates. Finally, we illustrate this theory on some standard examples and show numerical simulations which give an excellent match to the theory.
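The deterministic-equivalent claim can be glimpsed empirically: in high dimensions, summary statistics of streaming SGD barely fluctuate across runs. The sketch below only exhibits this concentration for a logistic-regression teacher-student setup (all parameters illustrative); the paper derives the limiting ODE system exactly.

```python
# Sketch: run-to-run concentration of streaming SGD statistics on a GLM.
import numpy as np

def run(seed, d=1000, steps=8000, lr=1.0):
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d); w_star /= np.linalg.norm(w_star)
    theta = np.zeros(d)
    overlaps = []
    for k in range(steps):
        x = rng.standard_normal(d)
        p = 1.0 / (1.0 + np.exp(-x @ w_star))       # teacher probability
        y = 2.0 * (rng.random() < p) - 1.0          # label in {-1, +1}
        z = y * (x @ theta)
        theta -= (lr / d) * (-y * x / (1.0 + np.exp(z)))   # logistic SGD step
        if k % 1000 == 0:                           # normalized alignment with w*
            overlaps.append(theta @ w_star / max(np.linalg.norm(theta), 1e-12))
    return np.array(overlaps)

curves = np.stack([run(s) for s in range(5)])
print("overlap curves (rows = seeds):")
print(np.round(curves, 3))   # rows nearly identical: the deterministic equivalent
```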


Temporal Difference Learning for High-Dimensional PIDEs with Jumps

Lu, Liwei, Guo, Hailong, Yang, Xu, Zhu, Yi

arXiv.org Artificial Intelligence

In this paper, we propose a deep learning framework for solving high-dimensional partial integro-differential equations (PIDEs) based on temporal difference learning. We introduce a set of Lévy processes and construct a corresponding reinforcement learning model. To simulate the entire process, we use deep neural networks to represent the solutions and non-local terms of the equations. Subsequently, we train the networks using the temporal difference error, the termination condition, and properties of the non-local terms as the loss function. The relative error of the method reaches O(10^{-3}) in 100-dimensional experiments and O(10^{-4}) in one-dimensional pure jump problems. Additionally, our method demonstrates the advantages of low computational cost and robustness, making it well-suited for addressing problems with different forms and intensities of jumps.
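The probabilistic object underneath such solvers is a jump-diffusion whose simulated trajectories supply the temporal-difference targets. The sketch below only simulates a 1-D compound-Poisson jump-diffusion and Monte Carlo estimates a terminal expectation; the paper's TD training loop and network parameterization are not reproduced here, and all parameters are illustrative.

```python
# Sketch: trajectories of a 1-D jump-diffusion X_t (Brownian part plus
# compound-Poisson jumps), with u(0, x0) = E[g(X_T)] estimated by Monte Carlo.
import numpy as np

rng = np.random.default_rng(3)
x0, T, nt, paths = 0.0, 1.0, 100, 20000
dt = T / nt
mu, sigma = 0.05, 0.4              # drift and diffusion
lam, jump_std = 2.0, 0.3           # jump intensity and jump-size scale
g = lambda x: np.maximum(x, 0.0)   # terminal condition

X = np.full(paths, x0)
for _ in range(nt):
    dW = rng.standard_normal(paths) * np.sqrt(dt)
    n_jumps = rng.poisson(lam * dt, paths)       # Poisson jump counts per step
    # sum of n iid N(0, jump_std^2) jumps has std jump_std * sqrt(n)
    J = rng.standard_normal(paths) * jump_std * np.sqrt(n_jumps)
    X += mu * dt + sigma * dW + J

print("Monte Carlo estimate of u(0, x0):", g(X).mean())
```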